(57) Abstract: A method and apparatus are provided for generating,
from an input set of documents, a word replaceability matrix
defining semantic similarity between words occurring in the input
document set. For each word, distinct word sequences of predetermined
length are identified from the documents of the set, each word
sequence being indicative of the context in which the word was used.
According to the relative frequency of occurrence of the identified
word sequences, a fuzzy set is generated for each word, comprising
membership values for corresponding groups of word sequences. For
each pair of words occurring in the document set, their respective
fuzzy sets are used to calculate the probability that the first word
of the pair is semantically suitable as a replacement for the second
word of the pair. These probabilities are collated to form a word
similarity matrix for use in an improved method of determining
document similarity and in information retrieval.
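
The steps summarised above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not state how membership values or replacement probabilities are computed, so the formulas below are assumptions chosen for illustration. Here a context is taken to be the neighbouring words within a window of `width` words, the membership of a context is the sum of min(p, q) over all context frequencies q of the same word (so contexts with equal frequency form a group sharing one membership value), and the replaceability of the second word by the first is the expected membership, in the first word's fuzzy set, of a context drawn from the second word's frequency distribution. All function names and the `width` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def context_counts(documents, width=3):
    """Count, for each word, the neighbouring words seen in each window of `width` words."""
    counts = defaultdict(Counter)
    half = width // 2
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(half, len(tokens) - half):
            # The context is the window with the centre word removed.
            context = tuple(tokens[i - half:i] + tokens[i + 1:i + 1 + half])
            counts[tokens[i]][context] += 1
    return counts


def relative_frequencies(counter):
    """Turn raw context counts into relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {ctx: n / total for ctx, n in counter.items()}


def fuzzy_set(freqs):
    """Assumed membership rule: mu(c) = sum over c' of min(p(c), p(c'))."""
    values = list(freqs.values())
    return {ctx: sum(min(p, q) for q in values) for ctx, p in freqs.items()}


def replaceability_matrix(documents, width=3):
    """Return {(w1, w2): assumed probability that w1 can replace w2}."""
    counts = context_counts(documents, width)
    freqs = {w: relative_frequencies(c) for w, c in counts.items()}
    fuzzy = {w: fuzzy_set(f) for w, f in freqs.items()}
    matrix = {}
    for w1 in fuzzy:
        for w2 in freqs:
            # Expected membership in w1's fuzzy set of a context drawn from w2.
            matrix[(w1, w2)] = sum(
                fuzzy[w1].get(ctx, 0.0) * p for ctx, p in freqs[w2].items()
            )
    return matrix


if __name__ == "__main__":
    docs = ["the quick brown fox", "the slow brown fox", "the quick grey cat"]
    matrix = replaceability_matrix(docs)
    print(matrix[("quick", "slow")], matrix[("slow", "quick")])
```

Under these assumptions the matrix is not symmetric, which matches the directional wording of the abstract: a word seen in many contexts may be a suitable replacement for a word seen in few of them without the reverse holding to the same degree.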